An Almost Constant Lower Bound of the Isoperimetric Coefficient in the KLS Conjecture
We prove an almost constant lower bound of the isoperimetric coefficient in the KLS conjecture. The lower bound has the dimension dependency $d^{-o_d(1)}$. When the dimension is large enough, our lower bound is tighter than the previous best bound, which has the dimension dependency $d^{-1/4}$. Improving the current best lower bound of the isoperimetric coefficient in the KLS conjecture has many implications, including improvements of the current best bounds in Bourgain's slicing conjecture and in the thin-shell conjecture, better concentration inequalities for Lipschitz functions of log-concave measures, and better mixing time bounds for MCMC sampling algorithms on log-concave measures.
Comment: 25 pages, 1 figure, accepted in GAFA journal
When does Metropolized Hamiltonian Monte Carlo provably outperform Metropolis-adjusted Langevin algorithm?
We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm, and satisfies isoperimetry. We bound the gradient complexity to reach $\epsilon$ error in total variation distance from a warm start by $\tilde{O}(d^{1/4}\mathrm{polylog}(1/\epsilon))$ and demonstrate the benefit of choosing the number of leapfrog steps to be larger than 1. To surpass the previous analysis of the Metropolis-adjusted Langevin algorithm (MALA), which has $\tilde{O}(d^{1/2}\mathrm{polylog}(1/\epsilon))$ dimension dependency in Wu et al. (2022), we reveal a key feature of our proof: the joint distribution of the location and velocity variables of the discretization of the continuous HMC dynamics stays approximately invariant. This key feature, when shown via induction over the number of leapfrog steps, enables us to obtain estimates on moments of various quantities that appear in the acceptance rate control of Metropolized HMC. Moreover, to deal with another bottleneck in the literature, the control of the overlap of HMC proposal distributions, we provide a new approach to upper bound the Kullback-Leibler divergence between push-forwards of the Gaussian distribution through HMC dynamics initialized at two different points. Notably, our analysis does not require log-concavity or independence of the marginals, and relies only on an isoperimetric inequality. To illustrate the applicability of our result, several examples of natural functions that fall into our framework are discussed.
Comment: 42 pages
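For concreteness, here is a minimal sketch of the algorithm analyzed above: Metropolized HMC with a leapfrog integrator. It assumes access to the target's log-density and gradient; all names and parameter values are illustrative, not from the paper.

```python
import numpy as np

def leapfrog(x, v, grad_log_p, step_size, n_steps):
    """Leapfrog integration of Hamiltonian dynamics for H(x,v) = -log p(x) + |v|^2/2."""
    v = v + 0.5 * step_size * grad_log_p(x)    # half step for velocity
    for _ in range(n_steps - 1):
        x = x + step_size * v                  # full step for position
        v = v + step_size * grad_log_p(x)      # full step for velocity
    x = x + step_size * v
    v = v + 0.5 * step_size * grad_log_p(x)    # final half step for velocity
    return x, v

def metropolized_hmc(log_p, grad_log_p, x0, step_size, n_steps, n_iters, rng=None):
    """Metropolized HMC: leapfrog proposal plus accept/reject targeting exp(log_p)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_iters):
        v = rng.standard_normal(x.shape)       # resample velocity each iteration
        x_new, v_new = leapfrog(x, v, grad_log_p, step_size, n_steps)
        # Leapfrog is volume-preserving and time-reversible, so the Metropolis
        # ratio is exp(H_old - H_new) with H(x,v) = -log p(x) + |v|^2/2.
        log_accept = (log_p(x_new) - 0.5 * v_new @ v_new) - (log_p(x) - 0.5 * v @ v)
        if np.log(rng.uniform()) < log_accept:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

# Example: standard Gaussian target; n_steps > 1 is the regime studied above.
samples = metropolized_hmc(lambda x: -0.5 * x @ x, lambda x: -x,
                           x0=np.zeros(10), step_size=0.2, n_steps=5, n_iters=1000)
```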
Fast and Robust Archetypal Analysis for Representation Learning
We revisit a pioneering unsupervised learning technique called archetypal analysis, which is related to successful data analysis methods such as sparse coding and non-negative matrix factorization. Since it was proposed, archetypal analysis has not gained much popularity, even though it produces more interpretable models than other alternatives. Because no efficient implementation has ever been made publicly available, its application to important scientific problems may have been severely limited. Our goal is to bring archetypal analysis back into favour. We propose a fast optimization scheme using an active-set strategy, and provide an efficient open-source implementation interfaced with Matlab, R, and Python. Then, we demonstrate the usefulness of archetypal analysis for computer vision tasks, such as codebook learning, signal classification, and large image collection visualization.
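The paper's solver uses an active-set strategy; the following is only a minimal projected-gradient sketch of the underlying problem $\min_{A,B} \|X - XBA\|_F^2$ with columns of $A$ and $B$ on the simplex, so archetypes are convex combinations of data points. All names are illustrative and this is not the paper's algorithm.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of each column of v onto the probability simplex."""
    u = np.sort(v, axis=0)[::-1]                    # sort each column descending
    css = np.cumsum(u, axis=0) - 1.0
    idx = np.arange(1, v.shape[0] + 1)[:, None]
    rho = (u - css / idx > 0).sum(axis=0)           # support size per column
    theta = css[rho - 1, np.arange(v.shape[1])] / rho
    return np.maximum(v - theta, 0.0)

def archetypal_analysis(X, k, n_iters=200, lr=1e-3, rng=None):
    """Alternating projected gradient for min ||X - X B A||_F^2,
    A in R^{k x n}, B in R^{n x k}, columns constrained to the simplex."""
    rng = np.random.default_rng() if rng is None else rng
    _, n = X.shape
    A = project_simplex(rng.random((k, n)))
    B = project_simplex(rng.random((n, k)))
    for _ in range(n_iters):
        Z = X @ B                                   # current archetypes
        R = X - Z @ A                               # reconstruction residual
        A = project_simplex(A + lr * (Z.T @ R))     # gradient step on codes A
        B = project_simplex(B + lr * (X.T @ (R @ A.T)))  # gradient step on B
    return X @ B, A                                 # archetypes and codes
```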
A Simple Proof of the Mixing of Metropolis-Adjusted Langevin Algorithm under Smoothness and Isoperimetry
We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling a target density on $\mathbb{R}^d$. We assume that the target density satisfies $\psi_\mu$-isoperimetry and that the operator norm and trace of its Hessian are bounded by $L$ and $\Upsilon$, respectively. Our main result establishes that, from a warm start, to achieve $\epsilon$-total variation distance to the target density, MALA mixes in $O\left(\frac{(L\Upsilon)^{1/2}}{\psi_\mu^2}\log(1/\epsilon)\right)$ iterations. Notably, this result holds beyond the log-concave sampling setting, and the mixing time depends only on $\Upsilon$ rather than its upper bound $Ld$. In the $m$-strongly logconcave and $L$-log-smooth sampling setting, our bound recovers the previous minimax mixing bound of MALA~\cite{wu2021minimax}.
Comment: 16 pages
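For orientation, a minimal sketch of MALA itself (the standard algorithm, not code from the paper): a Langevin proposal followed by a Metropolis-Hastings correction, whose asymmetric proposal density must appear in the acceptance ratio.

```python
import numpy as np

def mala(log_p, grad_log_p, x0, step_size, n_iters, rng=None):
    """Metropolis-adjusted Langevin algorithm targeting exp(log_p)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_iters):
        # Langevin proposal: x' ~ N(x + h * grad log p(x), 2h I)
        mean_fwd = x + step_size * grad_log_p(x)
        x_new = mean_fwd + np.sqrt(2.0 * step_size) * rng.standard_normal(x.shape)
        # Hastings correction: the proposal is asymmetric, so the reverse
        # transition density enters the acceptance ratio.
        mean_bwd = x_new + step_size * grad_log_p(x_new)
        log_q_fwd = -np.sum((x_new - mean_fwd) ** 2) / (4.0 * step_size)
        log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (4.0 * step_size)
        log_accept = log_p(x_new) + log_q_bwd - log_p(x) - log_q_fwd
        if np.log(rng.uniform()) < log_accept:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)
```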
Domain adaptation under structural causal models
Domain adaptation (DA) arises as an important problem in statistical machine
learning when the source data used to train a model is different from the
target data used to test the model. Recent advances in DA have mainly been
application-driven and have largely relied on the idea of a common subspace for
source and target data. To understand the empirical successes and failures of
DA methods, we propose a theoretical framework via structural causal models
that enables analysis and comparison of the prediction performance of DA
methods. This framework also allows us to itemize the assumptions needed for
the DA methods to have a low target error. Additionally, with insights from our
theory, we propose a new DA method called CIRM that outperforms existing DA
methods when both the covariates and label distributions are perturbed in the
target data. We complement the theoretical analysis with extensive simulations
to show the necessity of the devised assumptions. Reproducible synthetic and
real data experiments are also provided to illustrate the strengths and
weaknesses of DA methods when parts of the assumptions in our theory are
violated.
Comment: 80 pages, 22 figures, accepted in JMLR
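To make the problem setting concrete, here is a toy simulation (ours, not from the paper) of a linear structural causal model in which an intervention shifts the mechanism of an anticausal feature in the target domain: a source-trained regression that uses that feature degrades, while one restricted to the causal parent stays stable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

def simulate(x2_shift):
    """Linear SCM: X1 -> Y -> X2. In the target domain an intervention
    shifts the mechanism generating the anticausal feature X2."""
    x1 = rng.standard_normal(n)
    y = 2.0 * x1 + rng.standard_normal(n)            # Y is caused by X1
    x2 = 0.5 * y + x2_shift + rng.standard_normal(n) # X2 is caused by Y
    return np.column_stack([x1, x2]), y

X_src, y_src = simulate(x2_shift=0.0)                # source domain
X_tgt, y_tgt = simulate(x2_shift=3.0)                # target domain

# OLS on source using both covariates: relies on the shifted mechanism.
beta = np.linalg.lstsq(X_src, y_src, rcond=None)[0]
print("both covariates: source MSE %.2f, target MSE %.2f" %
      (np.mean((X_src @ beta - y_src) ** 2),
       np.mean((X_tgt @ beta - y_tgt) ** 2)))

# A predictor using only the causal parent X1 is invariant to this shift.
beta1 = np.linalg.lstsq(X_src[:, :1], y_src, rcond=None)[0]
print("causal only:     target MSE %.2f" %
      np.mean((X_tgt[:, :1] @ beta1 - y_tgt) ** 2))
```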
Fast MCMC sampling algorithms on polytopes
We propose and analyze two new MCMC sampling algorithms, the Vaidya walk and the John walk, for generating samples from the uniform distribution over a polytope. Both random walks are sampling algorithms derived from interior point methods. The former is based on the volumetric-logarithmic barrier introduced by Vaidya, whereas the latter uses John's ellipsoids. We show that the Vaidya walk mixes in significantly fewer steps than the logarithmic-barrier-based Dikin walk studied in past work. For a polytope in $\mathbb{R}^d$ defined by $n$ linear constraints, we show that the mixing time from a warm start is bounded as $\mathcal{O}(n^{0.5}d^{1.5})$, compared to the $\mathcal{O}(nd)$ mixing time bound for the Dikin walk. The cost of each step of the Vaidya walk is of the same order as that of the Dikin walk, and at most twice as large in terms of constant pre-factors. For the John walk, we prove an $\mathcal{O}(d^{2.5}\log^4(n/d))$ bound on its mixing time and conjecture that an improved variant of it could achieve a mixing time of $\mathcal{O}(d^2\,\mathrm{polylog}(n/d))$. Additionally, we propose variants of the Vaidya and John walks that mix in polynomial time from a deterministic starting point. The speed-up of the Vaidya walk over the Dikin walk is illustrated in numerical examples.
Comment: 86 pages, 9 figures, First two authors contributed equally
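For orientation, a minimal sketch of the baseline Dikin walk on a polytope $\{x : Ax \le b\}$, stated as a Metropolis-filtered variant. The Vaidya and John walks replace the log-barrier Hessian below with volumetric-barrier and John-ellipsoid analogues; the radius constant here is illustrative, not the tuned value from the paper.

```python
import numpy as np

def dikin_walk(A, b, x0, n_iters, radius=0.5, rng=None):
    """Dikin walk for uniform sampling over {x : A x <= b}.
    Proposals are Gaussians shaped by the log-barrier Hessian
    H(x) = sum_i a_i a_i^T / s_i(x)^2 with slacks s(x) = b - A x."""
    rng = np.random.default_rng() if rng is None else rng
    _, d = A.shape
    x = np.asarray(x0, dtype=float)   # must be strictly feasible

    def hessian(z):
        s = b - A @ z
        return (A / s[:, None] ** 2).T @ A

    samples = []
    for _ in range(n_iters):
        H = hessian(x)
        # Propose y ~ N(x, (radius^2 / d) * H(x)^{-1}).
        L = np.linalg.cholesky(np.linalg.inv(H))
        y = x + (radius / np.sqrt(d)) * (L @ rng.standard_normal(d))
        if np.all(A @ y < b):         # reject infeasible proposals outright
            Hy = hessian(y)
            # Metropolis ratio for the state-dependent Gaussian proposals.
            log_q_xy = 0.5 * np.linalg.slogdet(H)[1] \
                - (d / (2 * radius**2)) * (y - x) @ H @ (y - x)
            log_q_yx = 0.5 * np.linalg.slogdet(Hy)[1] \
                - (d / (2 * radius**2)) * (x - y) @ Hy @ (x - y)
            if np.log(rng.uniform()) < log_q_yx - log_q_xy:
                x = y
        samples.append(x.copy())
    return np.array(samples)
```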
Minimax Mixing Time of the Metropolis-Adjusted Langevin Algorithm for Log-Concave Sampling
We study the mixing time of the Metropolis-adjusted Langevin algorithm (MALA) for sampling from a log-smooth and strongly log-concave distribution. We establish its optimal minimax mixing time under a warm start. Our main contribution is two-fold. First, for a $d$-dimensional log-concave density with condition number $\kappa$, we show that MALA with a warm start mixes in $\kappa\sqrt{d}$ iterations up to logarithmic factors. This improves upon the previous work on the dependency of either the condition number $\kappa$ or the dimension $d$. Our proof relies on comparing the leapfrog integrator with the continuous Hamiltonian dynamics, where we establish a new concentration bound for the acceptance rate. Second, we prove a spectral-gap-based mixing time lower bound for reversible MCMC algorithms on general state spaces. We apply this lower bound result to construct a hard distribution for which MALA requires at least $\tilde{\Omega}(\kappa\sqrt{d})$ steps to mix. The lower bound for MALA matches our upper bound in terms of condition number and dimension. Finally, numerical experiments are included to validate our theoretical results.
Comment: 63 pages, 2 figures
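The classical relation behind such spectral-gap lower bounds is the following textbook statement (given here for context in the finite-state, reversible case, not quoted from the paper, which extends the idea to general state spaces):

```latex
% For a reversible Markov chain with absolute spectral gap \gamma,
% the \epsilon-mixing time satisfies (Levin & Peres, Thm. 12.5):
\[
  t_{\mathrm{mix}}(\epsilon) \;\ge\;
  \left(\frac{1}{\gamma} - 1\right) \log\!\left(\frac{1}{2\epsilon}\right),
\]
% so exhibiting a distribution on which MALA's spectral gap is small
% immediately yields a mixing time lower bound.
```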
Fast MCMC algorithms, Stability and DeepTune
Drawing samples from a known distribution is a core computational challenge common to many disciplines, with applications in statistics, probability, operations research, and other areas involving stochastic models. In statistics, sampling methods are useful for both estimation and inference, including problems such as estimating expectations of desired quantities, computing probabilities of rare events, gauging volumes of particular sets, exploring posterior distributions, and obtaining credible intervals.

Facing massive high-dimensional data, both computational efficiency and good statistical guarantees are increasingly important in modern statistical and machine learning applications. In this thesis, centered around sampling algorithms, we consider fundamental questions about their computational and statistical guarantees: How does one design a fast sampling algorithm, and how long should it be run? What are the statistical learning guarantees of these algorithms? Are there trade-offs between computation and learning?

To answer these questions, we first establish non-asymptotic convergence guarantees for popular MCMC sampling algorithms in the Bayesian literature: the Metropolized random walk, the Metropolis-adjusted Langevin algorithm, and Hamiltonian Monte Carlo. To address a number of technical challenges that arise en route, we develop results based on the conductance profile in order to prove quantitative convergence guarantees for general continuous-state-space Markov chains. Second, to confront a large class of constrained sampling problems, we introduce two new algorithms, the Vaidya and John walks, to sample from polytope-constrained distributions with convergence guarantees. Third, we prove fundamental trade-off results between the statistical learning performance and the convergence rate of any iterative learning algorithm, including sampling algorithms. The trade-off results allow us to show that a too-stable algorithm cannot converge too fast, and vice versa. Finally, to help neuroscientists analyze their massive amounts of brain data, we develop DeepTune, a stability-driven visualization and interpretation framework based on optimization and sampling, for neural-network-based models of neurons in the visual cortex.
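The conductance-based route mentioned above rests on a classical Cheeger-type argument. A standard statement (given for context, not quoted from the thesis): for a reversible, lazy Markov chain with transition kernel $T$ and stationary distribution $\pi$,

```latex
% Conductance of the chain and the resulting mixing time bound
% from an M-warm start (Lovasz-Simonovits style):
\[
  \Phi \;=\; \inf_{S:\,\pi(S)\le 1/2}
  \frac{\int_S T_x(S^c)\,\pi(dx)}{\pi(S)},
  \qquad
  t_{\mathrm{mix}}(\epsilon)
  \;\lesssim\; \frac{1}{\Phi^2}\,\log\!\left(\frac{M}{\epsilon}\right).
\]
% Refining \Phi to a conductance profile \Phi(v) over sets of small
% measure tightens the dependence on the warmness parameter M.
```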